5 kinds of NoSQL

FrOSCon 2020

Henrik Ingo, DataStax


RDBMS

NoSQL

Key-Value

It's fast because...

It's simple
It's in RAM
It's denormalized
Can use hash index
Hash based sharding

redis 127.0.0.1:6379> SET name "Henrik"
OK 
redis 127.0.0.1:6379> GET name 
"Henrik"
redis 127.0.0.1:6379> SET age "43"
OK 
redis 127.0.0.1:6379> GET age
"43"
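
The "hash index" and "hash based sharding" points can be sketched in a few lines of Python. This is an illustration only, not how Redis is implemented: the `ShardedKV` class, its method names and the shard count are made up for this example. Each shard is a plain dict (a hash index), and a hash of the key decides which shard owns it.

```python
import hashlib


class ShardedKV:
    """Toy key-value store (hypothetical): N dicts, each key routed by its hash."""

    def __init__(self, num_shards=4):
        # Each shard is a dict, i.e. a hash index held in RAM.
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        # A stable hash, so every client routes the same key to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def set(self, key, value):
        self.shards[self._shard_for(key)][key] = value
        return "OK"

    def get(self, key):
        return self.shards[self._shard_for(key)].get(key)


kv = ShardedKV()
kv.set("name", "Henrik")
kv.set("age", "43")
print(kv.get("name"), kv.get("age"))  # prints: Henrik 43
```

Both SET and GET touch exactly one shard and do one hash lookup there, which is where the speed comes from. The price is that there is no efficient way to ask for a range of keys.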

Use cases

Cache. Session cache.

In-memory, low latency computing. (Write heavy.)

Recommendation engines & Machine Learning.

Queue

One more thing...

Redis complex data types: lists, sets, hashes (maps) and streams.

Wide-Column

What does it do???

Tables, rows & columns.
All data access uses Primary Key
PK can be composite:
Partition Key + Clustering Keys
Partition Key is required

CREATE TABLE people (id UUID PRIMARY KEY, firstname text, lastname text);

INSERT INTO people (id, lastname, firstname) 
    VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 'Ingo','Henrik');

SELECT lastname, firstname FROM people WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
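
A rough Python sketch of the composite primary key idea: the partition key finds the partition, and the clustering key keeps rows sorted inside it. The `WideColumnTable` class and the event data are hypothetical, invented for this example. It is not Cassandra's storage engine.

```python
import bisect
from collections import defaultdict


class WideColumnTable:
    """Toy wide-column table (hypothetical): partition key -> rows sorted by clustering key."""

    def __init__(self):
        # Each partition holds a list of (clustering_key, columns), kept sorted.
        self.partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, **columns):
        rows = self.partitions[partition_key]
        keys = [ck for ck, _ in rows]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(rows) and rows[i][0] == clustering_key:
            rows[i] = (clustering_key, columns)  # same primary key: overwrite
        else:
            rows.insert(i, (clustering_key, columns))

    def select(self, partition_key, start=None, end=None):
        # The partition key is required; the clustering key range is optional.
        rows = self.partitions.get(partition_key, [])
        return [
            (ck, cols) for ck, cols in rows
            if (start is None or ck >= start) and (end is None or ck <= end)
        ]


# Made-up data: partition key = user, clustering key = event sequence number.
t = WideColumnTable()
t.insert("henrik", 3, event="logout")
t.insert("henrik", 1, event="login")
t.insert("henrik", 2, event="search")
print(t.select("henrik", start=2))
```

Rows come back in clustering key order no matter the insert order, and every read names exactly one partition.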

Use cases

Large (aka Web Scale) 100+ TB databases

Write optimized storage engine

Write availability (Dynamo HA)

One more thing...

Useful secondary indexes: See Cassandra 4.0 and DataStax Enterprise 6.8.3.

Document

What does it do???

Records are JSON or XML
Flexible schema:
Structure but not fixed
Secondary indexes, complex queries, transactions

> db.somecollection.insert({firstname: "Henrik", lastname: "Ingo", age: 42})
> db.somecollection.createIndex({lastname:1, firstname:1});

> db.somecollection.find({lastname: "Ingo"})
{_id: ObjectId("507f1f77bcf86cd799439011"), firstname: "Henrik", 
lastname: "Ingo", age: 42}
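
Flexible schema plus a secondary index, sketched in Python. The `Collection` class is hypothetical and much simpler than MongoDB: its index is a hash lookup on the exact field combination, whereas a real document database uses a B-tree that also serves prefix and range queries.

```python
from collections import defaultdict


class Collection:
    """Toy document collection (hypothetical): dict documents plus one secondary index."""

    def __init__(self, index_fields):
        self.index_fields = tuple(index_fields)
        self.docs = {}                 # _id -> document
        self.index = defaultdict(set)  # indexed field values -> set of _id
        self._next_id = 1

    def insert(self, doc):
        doc_id = self._next_id
        self._next_id += 1
        self.docs[doc_id] = dict(doc, _id=doc_id)
        # Documents need not share fields; a missing indexed field is stored as None.
        key = tuple(doc.get(f) for f in self.index_fields)
        self.index[key].add(doc_id)
        return doc_id

    def find(self, **query):
        if set(query) == set(self.index_fields):
            # The query matches the index exactly: one lookup instead of a scan.
            key = tuple(query[f] for f in self.index_fields)
            return [self.docs[i] for i in sorted(self.index.get(key, ()))]
        # Otherwise scan every document.
        return [d for d in self.docs.values()
                if all(d.get(k) == v for k, v in query.items())]


people = Collection(index_fields=["lastname", "firstname"])
people.insert({"firstname": "Henrik", "lastname": "Ingo", "age": 42})
people.insert({"lastname": "Ingo", "talk": "5 kinds of NoSQL"})  # different shape, same collection
print(people.find(lastname="Ingo", firstname="Henrik"))
```

The second document has no `firstname` or `age` and carries a field the first one lacks. Nothing had to be declared up front.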

Use cases

General purpose database. Competes with RDBMS.

Main selling points compared to relational:
JSON API, flexible schema, sharding.

Flexible schema strengths: Data hub.

What does the future look like...

Incremental innovation? Performance, GUI tools, integrations, SDKs...

Graph

What does it do???

Records are nodes, connected by edges
Both can have properties
Indexes enable queries

gremlin> graph = TinkerFactory.createModern()
==> tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal()
==> graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().has('name','marko').out('knows').values('name')
==> 'vadas'
==> 'josh'
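
The same traversal, spelled out in plain Python to show what the steps do. The data is the TinkerPop "modern" toy graph; the helper functions `has`, `out` and `values` are simplified stand-ins written for this example, not the Gremlin API.

```python
# Vertices and edges of the TinkerPop "modern" toy graph. Both carry properties.
vertices = {
    1: {"label": "person", "name": "marko", "age": 29},
    2: {"label": "person", "name": "vadas", "age": 27},
    3: {"label": "software", "name": "lop", "lang": "java"},
    4: {"label": "person", "name": "josh", "age": 32},
    5: {"label": "software", "name": "ripple", "lang": "java"},
    6: {"label": "person", "name": "peter", "age": 35},
}
edges = [  # (out vertex, edge label, in vertex, properties)
    (1, "knows", 2, {"weight": 0.5}),
    (1, "knows", 4, {"weight": 1.0}),
    (1, "created", 3, {"weight": 0.4}),
    (4, "created", 5, {"weight": 1.0}),
    (4, "created", 3, {"weight": 0.4}),
    (6, "created", 3, {"weight": 0.2}),
]


def has(ids, key, value):
    """Keep only vertices whose property `key` equals `value`."""
    return [v for v in ids if vertices[v].get(key) == value]


def out(ids, label):
    """Follow outgoing edges with the given label."""
    return [dst for v in ids for (src, lbl, dst, _) in edges
            if src == v and lbl == label]


def values(ids, key):
    """Read one property from each vertex."""
    return [vertices[v][key] for v in ids]


# g.V().has('name','marko').out('knows').values('name')
names = values(out(has(list(vertices), "name", "marko"), "knows"), "name")
print(names)  # prints: ['vadas', 'josh']
```

Here `has` scans every vertex and `out` scans every edge. A graph database uses an index for the first step and stored adjacency for the second, which is what keeps traversals fast.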

Use cases

Analytical. Find friends of friends that own a cat

Social media, recommendation engines, etc.

National security

One more thing...

Gremlin, Cypher, GraphQL

OLTP graph databases exist. (DataStax)

Interesting unsolved problem: Optimal sharding for graph DBs.

Query Engine

What does it do???

Formerly known as Hadoop
"Batch" queries
But also interactive
Data stored in HDFS, S3, Cassandra, MongoDB...

import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("Froscon demo").getOrCreate()
import spark.implicits._
val df = spark.read.json("people.json")
df.createOrReplaceTempView("people")
val sqlDF = spark.sql("SELECT * FROM people")
sqlDF.show()
+-----------+----------+-----+
| firstname | lastname | age |
+-----------+----------+-----+
| Henrik    | Ingo     | 43  |
+-----------+----------+-----+
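
What makes this a query engine rather than a database is that the engine is separate from the storage. A minimal Python sketch of that split, assuming one JSON record per line (the default input format for `spark.read.json`): `scan`, `where` and `select` are hypothetical helper names, and the second record is made up for the example.

```python
import io
import json


def scan(source):
    """Read one JSON record per line from any file-like object (the 'storage')."""
    for line in source:
        if line.strip():
            yield json.loads(line)


def where(rows, predicate):
    """Filter rows lazily."""
    return (row for row in rows if predicate(row))


def select(rows, *columns):
    """Project each row onto the requested columns."""
    return ({c: row.get(c) for c in columns} for row in rows)


# Stand-in for people.json on HDFS or S3.
storage = io.StringIO(
    '{"firstname": "Henrik", "lastname": "Ingo", "age": 43}\n'
    '{"firstname": "Example", "lastname": "Person", "age": 17}\n'
)

# Roughly: SELECT firstname, lastname FROM people WHERE age >= 18
result = list(select(where(scan(storage), lambda r: r["age"] >= 18),
                     "firstname", "lastname"))
print(result)  # prints: [{'firstname': 'Henrik', 'lastname': 'Ingo'}]
```

The engine owns no data and builds no index: every query is a scan of whatever the storage layer hands over. Spark and Presto spread that scan over many machines.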

Use cases

Data lake. S3.

Personalized user profile

Fraud detection, national security...

"Reporting"

One more thing...

Spark Streaming (mini-batch)

AWS Athena = Presto

Search

What does it do???

Search for words
Ranked results
Faceting, highlighting

> curl -XPOST http://localhost:9200/froscon/people/id1 \
 -H 'Content-Type: application/json' -d '{"name":"Henrik Ingo"}'

> curl -XGET 'localhost:9200/froscon/_search?q=name:Ingo'

[{_index: "froscon", _type: "people", _id: "id1", _source:
{name: "Henrik Ingo"}}]
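
"Search for words" and "ranked results" both rest on an inverted index: a map from each word to the documents containing it. A small Python sketch, with a hypothetical `TinySearch` class and a deliberately naive score (raw term counts). Lucene-based engines use far better ranking functions.

```python
import re
from collections import Counter, defaultdict


class TinySearch:
    """Toy full-text index (hypothetical): inverted index with term-count ranking."""

    def __init__(self):
        self.postings = defaultdict(Counter)  # word -> {doc_id: occurrences}

    def index(self, doc_id, text):
        # Split into lowercase words and count each word per document.
        for word in re.findall(r"\w+", text.lower()):
            self.postings[word][doc_id] += 1

    def search(self, query):
        scores = Counter()
        for word in re.findall(r"\w+", query.lower()):
            for doc_id, count in self.postings.get(word, {}).items():
                scores[doc_id] += count
        # Highest score first; ties broken by doc id for a stable order.
        return sorted(scores, key=lambda d: (-scores[d], d))


s = TinySearch()
s.index("id1", "Henrik Ingo")
s.index("id2", "5 kinds of NoSQL, a talk about NoSQL by Henrik Ingo")
print(s.search("Ingo"))        # prints: ['id1', 'id2']
print(s.search("nosql ingo"))  # prints: ['id2', 'id1']
```

Matching is per word and case-insensitive, which a B-tree index on the whole string cannot offer.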

Use cases

Google for your website

Queries beyond the typical RDBMS BTree

Kibana analytics

Security monitoring

One more thing...

Elastic = MongoDB in size

Until 2018

	Apache/BSD	*GPL	open core	proprietary
Key-Value	Memcache		Redis
Wide Col	Cassandra			BigTable, DynamoDB
Document	MongoDB			MarkLogic
Graph		Neo4j	DSE Graph
Query Eng	Spark, Presto			Athena
Search	Lucene, Solr		Elastic

After 2018

	Apache/BSD	*GPL	open core	proprietary
Key-Value	Memcache	Redis		~~Redis~~
Wide Col	Cassandra			BigTable, DynamoDB
Document				MongoDB
Graph			Neo4j
Query Eng	Spark, Presto			Athena
Search	Lucene, Solr, Open Distro	Elastic

Image credits:

jay~dee @ Flickr